Conceptual Vector Learning - Comparing Bootstrapping from a Thesaurus or Induction by Emergence
Abstract
In the framework of Word Sense Disambiguation (WSD) and lexical transfer in Machine Translation (MT), the representation of word meanings is a critical issue. The conceptual vector model aims at representing thematic activations for chunks of text and lexical entries, up to whole documents. Roughly speaking, vectors are supposed to encode the ideas associated with words or expressions. In this paper, we first introduce the conceptual vector model and the notions of semantic distance and contextualization between terms. Then, we present in detail the text analysis process coupled with conceptual vectors, which is used in text classification, thematic analysis and vector learning. The question we focus on is whether a thesaurus is really needed and desirable for bootstrapping the learning. We conducted two experiments, one with and one without a thesaurus, and present some comparative results here. Our contribution is to show that an emergent procedure distributes dimensions more regularly. In other words, the resources are exploited more efficiently with an emergent procedure than with a thesaurus: terms (concepts) as listed in a thesaurus somehow relate to their importance in the language, but not to their frequency in usage nor to their power of discrimination or representativeness.
Similar resources
Extracting Semantic Taxonomies of Nouns from a Korean MRD Using a Small Bootstrapping Thesaurus and a Machine Learning Approach
Neural Learning of Embodied Interaction
This paper presents our approach towards realizing a robot which can bootstrap itself towards higher complexity through embodied interaction dynamics with the environment including other agents. First, the elements of interaction dynamics are extracted from conceptual analysis of embodied interaction and its emergence, especially, of behavioral imitation. Then three case studies are made, prese...
Automatic Retrieval and Clustering of Similar Words
Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows us to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the the...
Unsupervised selection of semantic relations for improving a distributional thesaurus (Sélection non supervisée de relations sémantiques pour améliorer un thésaurus distributionnel) [in French]
Work on distributional thesauri has shown that the relations in these thesauri are mainly reliable for high-frequency words. In this article, we propose a method for improving such a thesaurus by re-balancing it in favor of low-frequency words. This method is based on a bootstrapping mechanism: a set...
Learning Thesaurus Relations from Distributional Features
In distributional semantics, words are represented by aggregated context features. The similarity of words can be computed by comparing their feature vectors. Thus, we can predict whether two words are synonymous or similar with respect to some other semantic relation. We will show on six different datasets of pairs of similar and non-similar words that a supervised learning algorithm on feature...